Induction of comprehensible models for gene expression datasets by subgroup discovery methodology

Authors

  • Dragan Gamberger
  • Nada Lavrac
  • Filip Zelezný
  • Jakub Tolar
Abstract

Finding disease markers (classifiers) from gene expression data by machine learning algorithms is characterized by a high risk of overfitting the data due to the abundance of attributes (simultaneously measured gene expression values) and the shortage of available examples (observations). To avoid this pitfall and achieve predictor robustness, state-of-the-art approaches construct complex classifiers that combine relatively weak contributions of up to thousands of genes (attributes) to classify a disease. The complexity of such classifiers limits their transparency and consequently the biological insights they can provide. The goal of this study is to apply to this domain the methodology of constructing simple yet robust logic-based classifiers amenable to direct expert interpretation. On two well-known, publicly available gene expression classification problems, the paper shows the feasibility of this approach, employing a recently developed subgroup discovery methodology. Some of the discovered classifiers allow for novel biological interpretations.
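
As a rough illustration of what a simple logic-based classifier of this kind can look like, the Python sketch below searches for a short conjunction of binarized gene features that covers many positive examples and few negative ones. Everything in it is an assumption made for illustration: the function name discover_rule, the feature names, the toy data, and the scoring heuristic TP / (FP + g). The abstract does not specify the details of the subgroup discovery algorithm, so this should not be read as the authors' implementation.

```python
from itertools import combinations


def discover_rule(samples, labels, g=1.0, max_terms=2):
    """Exhaustively search short conjunctions of binary features.

    samples: list of dicts mapping feature name -> bool (hypothetical
             binarized expression values, e.g. "gene is highly expressed")
    labels:  list of bools, True for the target class
    g:       assumed generalization parameter; larger values tolerate
             more false positives in exchange for wider coverage
    """
    features = sorted(samples[0].keys())
    best_rule, best_score = None, float("-inf")
    for k in range(1, max_terms + 1):
        for rule in combinations(features, k):
            # A sample is covered when every feature in the rule is True.
            covered = [all(s[f] for f in rule) for s in samples]
            tp = sum(1 for c, y in zip(covered, labels) if c and y)
            fp = sum(1 for c, y in zip(covered, labels) if c and not y)
            score = tp / (fp + g)  # assumed quality heuristic
            if score > best_score:
                best_rule, best_score = rule, score
    return best_rule, best_score


# Made-up toy data: five samples, three binarized gene features.
samples = [
    {"geneA_high": True, "geneB_high": True, "geneC_high": False},
    {"geneA_high": True, "geneB_high": True, "geneC_high": True},
    {"geneA_high": True, "geneB_high": False, "geneC_high": True},
    {"geneA_high": False, "geneB_high": True, "geneC_high": False},
    {"geneA_high": False, "geneB_high": False, "geneC_high": True},
]
labels = [True, True, False, False, False]

rule, score = discover_rule(samples, labels)
print(" AND ".join(rule), score)
```

Limiting rules to one or two terms keeps the result readable by a domain expert, which is the property the abstract emphasizes. With thousands of genes an exhaustive search over pairs is still feasible, but longer conjunctions would need a heuristic search such as beam search.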

Similar resources

Identification of Prognostic Genes in Her2-enriched Breast Cancer by Gene Co-Expression Network Analysis

Introduction: HER2-enriched subtype of breast cancer has a worse prognosis than luminal subtypes. Recently, the discovery of targeted therapies in other groups of breast cancer has increased patient survival. The aim of this study was to identify genes that affect the overall survival of this group of patients based on a systems biology approach. Methods: Gene expression data and clinical infor...

Design and Construction of ctxB-gfp-stxB Gene Cassette and Investigation of Its Expression in E. coli Bl21 (DE3)

Background & Objective: In order to enhance the expression of soluble proteins and facilitate their purification and the development of multi-functional polypeptides, chimeric recombinant proteins have been developed. The purpose of this study was to construct a ctxB-gfp-stxB gene cassette to measure the uptake and excretion of the chimeric antigen in future studies. Materials & Methods: After prep...

Optimization and High Level Production of Recombinant Synthetic Streptokinase in E. coli Using Response Surface Methodology

Streptokinase (SK) is an extracellular protein comprising 414 amino acids with considerable clinical importance as a commonly used thrombolytic agent. Due to its widespread application and clinical importance, designing more efficient SK production platforms is worth investigating. In this regard, a synthetic SK gene was optimized and cloned into the pET21b plasmid for periplasmic expres...

Relational Subgroup Discovery for Gene Expression Data Mining

We propose a methodology for predictive classification from gene expression data, able to combine the robustness of high-dimensional statistical classification methods with the comprehensibility and interpretability of simple logic-based models. We first construct a robust classifier combining contributions of a large number of gene expression values, and then search for compact summarizations o...

Journal:
  • Journal of Biomedical Informatics

Volume 37, Issue 4

Pages: -

Publication date: 2004